On the Design of a High-Performance Adaptive Router for CC-NUMA Multiprocessors

نویسندگان

  • Valentin Puente
  • José-Ángel Gregorio
  • Ramón Beivide
  • Cruz Izu
چکیده

This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which leads to a fully adaptive routing by employing only three virtual channels. In addition, we selectively use output buffers for implementing the most utilized virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade-off between network performance and hardware cost. The outcome of this research is a HighPerformance Adaptive Router (HPAR), which adequately balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation process in which HPAR is compared with other adaptive routers using FIFO input buffering, with or without additional virtual channels to reduce head-of-line blocking. This evaluation contemplates both the VLSI costs of each router and their performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among all the routers tested. The throughput gains ranged from 10% to 40% in respect to its most direct rival, which employs more hardware resources. Other results shown that HPAR achieves up to 83% of its theoretical maximum throughput under random traffic and up to 70% when running real applications. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered as a suitable candidate to implement packet interchange in next generations of CC-NUMA multiprocessors.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors

ÐCache coherent nonuniform memory access (CC-NUMA) multiprocessors provide a scalable design for shared memory. But, they continue to suffer from large remote memory access latencies due to comparatively slow memory technology and large data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote...

متن کامل

Design and Evaluation of a Switch

Cache coherent non-uniform memory access (CC-NUMA) multiprocessors provide a scal-able design for shared memory but they continue to suuer from large remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the in-terconnection network. In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote memory...

متن کامل

Shared Memory Multiprocessor Architectures for Software IP Routers

In this paper, we propose new shared memory multiprocessor architectures and evaluate their performance for future Internet Protocol (IP) routers based on Symmetric Multi-Processor (SMP) and Cache Coherent Non-Uniform Memory Access (CC-NUMA) paradigms. We also propose a benchmark application suite, RouterBench, which consists of four categories of applications representing key functions on the ...

متن کامل

Excel-NUMA: Toward Programmability, Simplicity, and High Performance

ÐWhile hardware-coherent scalable shared-memory multiprocessors are relatively easy to program, they still require substantial programming effort to deliver high performance. Specifically, to minimize remote accesses, data must be carefully laid out in memory for locality and application working sets carefully tuned for caches. It has been claimed that this programming effort is less necessary ...

متن کامل

ASCOMA: An Adaptive Hybrid Shared Memory Architecture

Scalable shared memory multiprocessors traditionally use either a cache coherent non uniform memory access CC NUMA or simple cache only memory architecture S COMA memory architecture Recently hybrid architectures that combine aspects of both CC NUMA and S COMA have emerged In this paper we present two improvements over other hybrid architectures The rst improvement is a page allocation algorith...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEEE Trans. Parallel Distrib. Syst.

دوره 14  شماره 

صفحات  -

تاریخ انتشار 2003